{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3.4. 将课程内容转为音频\n",
    "\n",
    "## 🚄 前言\n",
    "音频内容可以随时随地播放，让学习变得更加灵活和便捷。本节课程介绍如何借助文本生成模型 Qwen-Max、语音合成模型 CosyVoice 和视频编辑和处理 moviepy，将课程内容快速转换为音频，并生成对应的字幕。\n",
    "\n",
    "## 🍁 课程目标\n",
    "学完本课程后，你将能够：\n",
    "- 使用 Qwen-Max 优化录音稿\n",
    "- 了解 CosyVoice，并使用 CosyVoice 合成音频\n",
    "- 了解 moviepy，并使用 moviepy 生成字幕\n",
    "\n",
    "\n",
    "## 📖 课程目录\n",
    "\n",
    "- [1. 原理介绍](#🧮-1-原理介绍)\n",
    "- [2. 代码实践](#🛠️-2-代码实践)\n",
    "    - [2.1. 环境准备](#21-环境准备)\n",
    "    - [2.2. 设置 API 客户端](#22-设置-api-客户端)\n",
    "    - [2.3. 将课程内容转换为录音稿](#23-将课程内容转换为录音稿)\n",
    "    - [2.4. 转换为音频](#24-转换为音频)\n",
    "    - [2.5. 生成字幕](#25-生成字幕)\n",
    "  \n",
    "\n",
    "## 🧮 1. 原理介绍\n",
    "\n",
    "除了之前用到的 Qwen-Max， 本次课程你将用到以下模型和工具：\n",
    "\n",
    "* [CosyVoice](https://bailian.console.aliyun.com/#/model-market/detail/cosyvoice-v1)：CosyVoice 是通义实验室依托大规模预训练语言模型，深度融合文本理解和语音生成的新一代生成式语音合成大模型，支持文本至语音的实时流式合成。\n",
    "\n",
    "* [moviepy](https://zulko.github.io/moviepy/)：一个 Python 库，用于视频编辑和处理。它提供了许多方便的功能，可以帮助开发者创建、修改和合成视频文件。\n",
    "\n",
    "使用 CosyVoice 和 moviepy 将课程内容转换为音频的过程如下：\n",
    "\n",
    "<div align=\"center\">\n",
    "    <img src=\"https://gw.alicdn.com/imgextra/i2/O1CN01CW5UUR22TvmMQn2Mm_!!6000000007122-0-tps-2386-178.jpg\" alt=\"流程\" width=\"90%\"/>\n",
    "</div>\n",
    "\n",
    "\n",
    "## 🛠️ 2. 代码实践\n",
    "\n",
    "接下来，让我们执行以下代码，将第一节课生成的内容转换为音频，并生成字幕。\n",
    "\n",
    "### 2.1. 环境准备\n",
    "\n",
    "1. 安装 Python 库。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:01:59.195554Z",
     "iopub.status.busy": "2024-10-10T02:01:59.195205Z",
     "iopub.status.idle": "2024-10-10T02:02:03.526547Z",
     "shell.execute_reply": "2024-10-10T02:02:03.525907Z",
     "shell.execute_reply.started": "2024-10-10T02:01:59.195534Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "! pip install -r requirements.txt -q"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 导入必要的模块。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:02:03.527838Z",
     "iopub.status.busy": "2024-10-10T02:02:03.527584Z",
     "iopub.status.idle": "2024-10-10T02:02:05.235311Z",
     "shell.execute_reply": "2024-10-10T02:02:05.234790Z",
     "shell.execute_reply.started": "2024-10-10T02:02:03.527819Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import openai\n",
    "from dashscope.audio.tts_v2 import SpeechSynthesizer\n",
    "import json\n",
    "import re\n",
    "import time\n",
    "import traceback\n",
    "import dashscope\n",
    "from moviepy.editor import concatenate_audioclips, AudioFileClip\n",
    "from typing import List\n",
    "from utils import create_directory, save_file, read_text_from_file,load_config"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. 设置环境变量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "sys.path.append(\"../\")\n",
    "from config.load_key import load_key\n",
    "load_key()\n",
    "print(os.environ[\"DASHSCOPE_API_KEY\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. 加载配置文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:43:34.747005Z",
     "iopub.status.busy": "2024-10-10T02:43:34.746559Z",
     "iopub.status.idle": "2024-10-10T02:43:34.752525Z",
     "shell.execute_reply": "2024-10-10T02:43:34.751899Z",
     "shell.execute_reply.started": "2024-10-10T02:43:34.746974Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "project_config = load_config(\"config.json\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 2.2. 设置 API 客户端\n",
    "设置 OpenAI 的 API 客户端，用于后续调用阿里云百炼的 Qwen-Max 模型和 Flux-Merged 模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:02:08.294379Z",
     "iopub.status.busy": "2024-10-10T02:02:08.294014Z",
     "iopub.status.idle": "2024-10-10T02:02:08.340161Z",
     "shell.execute_reply": "2024-10-10T02:02:08.339657Z",
     "shell.execute_reply.started": "2024-10-10T02:02:08.294358Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "client = openai.OpenAI(\n",
    "    api_key=os.getenv(\"DASHSCOPE_API_KEY\"),\n",
    "    base_url=\"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 2.3. 将课程内容转换为录音稿\n",
    "\n",
    "首先，我们将 Markdown 格式的课程内容转换为 JSON 格式，然后使用 Qwen-Max 对录音稿中的内容进行优化，便于之后进行语音合成。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 定义一个 `create_speech_script` 函数，用于将课程内容转换为录音稿。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecutionIndicator": {
     "show": false
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:12:14.679112Z",
     "iopub.status.busy": "2024-10-10T02:12:14.678661Z",
     "iopub.status.idle": "2024-10-10T02:12:14.687155Z",
     "shell.execute_reply": "2024-10-10T02:12:14.686471Z",
     "shell.execute_reply.started": "2024-10-10T02:12:14.679080Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def create_speech_script(course_script):\n",
    "    \"\"\"\n",
    "    创建演讲稿脚本，从给定的课程脚本中提取标题和内容。\n",
    "\n",
    "    该函数将输入的课程脚本按行拆分，并提取其中的标题和内容，\n",
    "    形成一个结构化的输出，包含标题及其对应的内容（去除 Markdown 格式）。\n",
    "\n",
    "    参数:\n",
    "    course_script (str): 包含课程内容的 Markdown 格式字符串。\n",
    "\n",
    "    返回:\n",
    "    str: 返回一个 JSON 格式的字符串，包含所有标题和内容。\n",
    "    \"\"\"\n",
    "    output = []  # 用于保存输出的最终结果\n",
    "    lines = course_script.splitlines()  # 将脚本按行拆分\n",
    "    current_title = None  # 当前标题\n",
    "    current_content = []  # 当前内容列表\n",
    "\n",
    "    for line in lines:\n",
    "        line = line.strip()  # 去除每行的首尾空白字符\n",
    "\n",
    "        # 处理图片标签，去除 Markdown 图片格式\n",
    "        if line.startswith('!['):\n",
    "            continue  # 跳过图片行，直接进入下一个循环\n",
    "\n",
    "        # 处理一级标题\n",
    "        if line.startswith('# '):  # 检测到一级标题\n",
    "            if current_title:  # 如果已有标题，保存当前内容\n",
    "                output.append({\n",
    "                    \"title\": current_title,\n",
    "                    \"content\": \"\\n\".join(current_content).replace('*', '').replace('**', '')\n",
    "                })\n",
    "                current_content = []  # 重置内容\n",
    "            current_title = line[2:]  # 获取当前标题的文本内容（跳过 '# '）\n",
    "\n",
    "        # 处理二级标题\n",
    "        elif line.startswith('## '):  # 检测到二级标题\n",
    "            if current_title:  # 如果已有标题，保存当前内容\n",
    "                output.append({\n",
    "                    \"title\": current_title,\n",
    "                    \"content\": \"\\n\".join(current_content).replace('*', '').replace('**', '')\n",
    "                })\n",
    "                current_content = []  # 重置内容\n",
    "            current_title = line[3:]  # 获取当前标题的文本内容（跳过 '## '）\n",
    "\n",
    "        else:\n",
    "            # 处理列表和其他 Markdown 内容，移除多余的符号\n",
    "            clean_line = re.sub(r'\\*\\s*', '', line)  # 移除以 '*' 开头的内容\n",
    "            clean_line = re.sub(r'#+\\s*', '', clean_line)  # 移除 Markdown 标题格式\n",
    "            clean_line = clean_line.strip()  # 去除多余的空白字符\n",
    "\n",
    "            if clean_line:  # 确保不添加空行\n",
    "                current_content.append(clean_line)  # 将清理后的内容添加到当前内容列表\n",
    "\n",
    "    # 辐忘记添加最后一个标题与内容\n",
    "    if current_title:\n",
    "        output.append({\n",
    "            \"title\": current_title,\n",
    "            \"content\": \"\\n\".join(current_content).replace('*', '').replace('**', '')\n",
    "        })\n",
    "\n",
    "    # 将输出转换为 JSON 格式并返回\n",
    "    generated_content = json.dumps(output, ensure_ascii=False, indent=4)\n",
    "\n",
    "    # 打印生成的录音稿初稿\n",
    "    print(\"生成的录音稿初稿：\" + generated_content)\n",
    "\n",
    "    return generated_content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 调用 `create_speech_script` 函数将课程内容转换为录音稿。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:12:22.489408Z",
     "iopub.status.busy": "2024-10-10T02:12:22.489062Z",
     "iopub.status.idle": "2024-10-10T02:12:22.493893Z",
     "shell.execute_reply": "2024-10-10T02:12:22.493372Z",
     "shell.execute_reply.started": "2024-10-10T02:12:22.489389Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 读取配置文件中的课程脚本文件路径\n",
    "course_script_with_illustrations_file_path = project_config[\"course_script_with_illustrations_file_path\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 根据课程脚本文件路径读取课程内容\n",
    "script = read_text_from_file(course_script_with_illustrations_file_path)\n",
    "\n",
    "# 调用函数生成录音稿初稿\n",
    "speech_script_draft = create_speech_script(script)\n",
    "\n",
    "# 读取配置文件中的录音稿初稿文件路径\n",
    "speech_script_draft_file_path = project_config[\"speech_script_draft_file_path\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 保存录音稿初稿文件\n",
    "save_file(speech_script_draft, speech_script_draft_file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. 定义一个 `improve_speech_script` 函数，用于优化录音稿中的内容。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:46:30.813386Z",
     "iopub.status.busy": "2024-10-10T02:46:30.812995Z",
     "iopub.status.idle": "2024-10-10T02:46:30.818765Z",
     "shell.execute_reply": "2024-10-10T02:46:30.818177Z",
     "shell.execute_reply.started": "2024-10-10T02:46:30.813364Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def improve_speech_script(script):\n",
    "    \"\"\"\n",
    "    优化录音稿的内容，使其更适合口语表达。\n",
    "\n",
    "    此函数调用外部 API，对输入的录音稿 JSON 格式的内容进行处理，\n",
    "    生成更适合发言的纯文本内容。\n",
    "\n",
    "    参数:\n",
    "    script (str): JSON 格式的字符串，包含待优化的内容。\n",
    "\n",
    "    返回:\n",
    "    str: 生成的优化后的口语化文本。\n",
    "    \"\"\"\n",
    "\n",
    "    # 系统消息，设定机器人的角色\n",
    "    system_message = \"您是录音稿专家。\"\n",
    "\n",
    "    # 提示词创建，构建用于 API 请求的 prompt 字符串\n",
    "    prompt = (\n",
    "        f\"处理以下 JSON 中的 content 字段，并将内容转换为适合录音的纯文本形式。\"\n",
    "        f\"返回处理后的 JSON，不要任何额外的说明。内容格式要求：\\n\"\n",
    "        \"1. 对于英文的专有术语缩写，替换为全称。\\n\"  # 英文缩写的全称替换\n",
    "        \"2. 去除星号、井号等 Markdown 格式。\\n\"  # 移除 Markdown 标记\n",
    "        \"3. 去除换行符和段落分隔。\\n\"  # 删除换行符和段落分割符\n",
    "        \"4. 对于复杂的长难句，使用中文句号分割，便于口语表达。\\n\"  # 处理长难句使其更流畅\n",
    "        \" content 中的内容使用于发言使用。\\n\"  # 明确内容用途\n",
    "        f\"{script} \"  # 添加待处理的 JSON 内容\n",
    "        \"输出格式为 JSON。不包含任何额外的文字、解释或评论。\"  # 指明输出格式\n",
    "    )\n",
    "\n",
    "    # 调用 API 获取插图数据\n",
    "    # 使用预设的系统消息和用户提示词来生成回复\n",
    "    completion = client.chat.completions.create(\n",
    "        model=\"qwen-max-latest\",  # 指定使用的模型\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": system_message},  # 系统角色消息\n",
    "            {\"role\": \"user\", \"content\": prompt},  # 用户请求部分\n",
    "        ],\n",
    "    )\n",
    "\n",
    "    # 解析 API 返回的 JSON 格式结果\n",
    "    dumped_json = json.loads(completion.model_dump_json())  # 从响应中加载 JSON 数据\n",
    "\n",
    "    # 返回生成的内容，提取优化后的文本\n",
    "    generated_content = dumped_json['choices'][0]['message']['content']\n",
    "\n",
    "    # 打印生成的优化后的录音稿\n",
    "    print(\"生成的优化后的录音稿：\" + generated_content)\n",
    "\n",
    "    return generated_content  # 返回优化后的文本"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. 调用 `improve_speech_script` 函数优化录音稿中的内容。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:55:12.757442Z",
     "iopub.status.busy": "2024-10-10T02:55:12.757125Z",
     "iopub.status.idle": "2024-10-10T02:55:51.669041Z",
     "shell.execute_reply": "2024-10-10T02:55:51.668497Z",
     "shell.execute_reply.started": "2024-10-10T02:55:12.757425Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 调用函数优化录音稿初稿\n",
    "improved_speech_script = improve_speech_script(speech_script_draft)\n",
    "\n",
    "# 读取配置文件中的优化后的录音稿文件路径\n",
    "speech_script_file_path = project_config[\"speech_script_file_path\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 保存优化后的录音稿文件\n",
    "save_file(improved_speech_script, speech_script_file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 2.4. 转换为音频\n",
    "\n",
    "接下来，我们使用语音合成模型 CosyVoice 将录音稿转换为音频。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 定义一个 `read_json_file` 函数，用于从指定路径读取 JSON 文件并返回内容。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-10T02:59:39.677521Z",
     "iopub.status.busy": "2024-10-10T02:59:39.677193Z",
     "iopub.status.idle": "2024-10-10T02:59:39.680641Z",
     "shell.execute_reply": "2024-10-10T02:59:39.680056Z",
     "shell.execute_reply.started": "2024-10-10T02:59:39.677502Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def read_json_file(file_path):\n",
    "    with open(file_path, 'r', encoding='utf-8') as file:\n",
    "        data = json.load(file)\n",
    "    return data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 定义一个 `split_into_sentences` 函数，用于将输入文本按中文标点和括号分割成句子。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:59:40.732266Z",
     "iopub.status.busy": "2024-10-10T02:59:40.731918Z",
     "iopub.status.idle": "2024-10-10T02:59:40.737481Z",
     "shell.execute_reply": "2024-10-10T02:59:40.736916Z",
     "shell.execute_reply.started": "2024-10-10T02:59:40.732248Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def split_into_sentences(text):\n",
    "    # 中文标点符号列表\n",
    "    punctuation = ['，', '。', '；', '？', '！']\n",
    "    brackets = {'(': ')', '[': ']', '{': '}', '（': '）', '【': '】', '《': '》'}\n",
    "\n",
    "    # 初始化结果列表和临时句子存储\n",
    "    sentences = []\n",
    "    temp_sentence = ''\n",
    "    bracket_stack = []\n",
    "\n",
    "    # 遍历文本中的每一个字符\n",
    "    for char in text:\n",
    "        # 如果是左括号，压入栈\n",
    "        if char in brackets:\n",
    "            bracket_stack.append(char)\n",
    "        # 如果是右括号且与栈顶匹配，弹出栈\n",
    "        elif char in brackets.values() and bracket_stack and brackets[bracket_stack[-1]] == char:\n",
    "            bracket_stack.pop()\n",
    "\n",
    "        # 如果字符是中文标点之一且括号栈为空，表示句子结束\n",
    "        if char in punctuation and not bracket_stack:\n",
    "            # 添加临时句子到结果列表，并清空临时句子\n",
    "            sentences.append(temp_sentence.strip())\n",
    "            temp_sentence = ''\n",
    "        # 如果字符是空格，也可以视为句子结束\n",
    "        elif char == ' ':\n",
    "            # 如果临时句子不是空，将其添加到结果列表\n",
    "            if temp_sentence.strip():  # 仅在临时句子不为空时添加\n",
    "                sentences.append(temp_sentence.strip())\n",
    "                temp_sentence = ''\n",
    "        else:\n",
    "            # 否则，将字符添加到临时句子中\n",
    "            temp_sentence += char\n",
    "\n",
    "    # 处理最后一个可能没有标点结尾的句子\n",
    "    if temp_sentence:\n",
    "        sentences.append(temp_sentence.strip())\n",
    "\n",
    "    return sentences"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. 定义一个 `save_sentences_to_markdown` 函数，用于将分割后的句子保存为 Markdown 文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-10T02:59:42.247271Z",
     "iopub.status.busy": "2024-10-10T02:59:42.246929Z",
     "iopub.status.idle": "2024-10-10T02:59:42.251057Z",
     "shell.execute_reply": "2024-10-10T02:59:42.250565Z",
     "shell.execute_reply.started": "2024-10-10T02:59:42.247252Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def save_sentences_to_markdown(sentences, base_dir, index1):\n",
    "    for index2, sentence in enumerate(sentences, start=1):\n",
    "        # 创建目录\n",
    "        dir_name = f'audio_for_paragraph_{index1}'\n",
    "        dir_path = os.path.join(base_dir, dir_name)\n",
    "        os.makedirs(dir_path, exist_ok=True)\n",
    "\n",
    "        # 构建文件名\n",
    "        file_name = f'paragraph_{index1}_sentence_{index2}.md'\n",
    "        file_path = os.path.join(dir_path, file_name)\n",
    "\n",
    "        # 写入Markdown文件\n",
    "        with open(file_path, 'w', encoding='utf-8') as file:\n",
    "            file.write(sentence + '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4 定义一个 `process_json_file` 函数，用于处理指定的 JSON 文件并生成 Markdown 文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-10T02:59:43.127388Z",
     "iopub.status.busy": "2024-10-10T02:59:43.127045Z",
     "iopub.status.idle": "2024-10-10T02:59:43.131449Z",
     "shell.execute_reply": "2024-10-10T02:59:43.130892Z",
     "shell.execute_reply.started": "2024-10-10T02:59:43.127370Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def process_json_file(json_file_path, base_dir):\n",
    "\n",
    "    if not os.path.exists(base_dir):\n",
    "        os.makedirs(base_dir)\n",
    "\n",
    "    file_prefix = os.path.splitext(os.path.basename(json_file_path))[0]\n",
    "\n",
    "    # base_dir = os.path.join(base_dir, file_prefix)\n",
    "\n",
    "    # 读取JSON文件\n",
    "    json_data = read_json_file(json_file_path)\n",
    "\n",
    "    # 处理JSON数据中的每个条目\n",
    "    for index1, item in enumerate(json_data):\n",
    "        if 'content' in item:\n",
    "            content = item['content']\n",
    "            # 检查content是否为链接\n",
    "            if not is_url(content):\n",
    "                sentences = split_into_sentences(content)\n",
    "                save_sentences_to_markdown(sentences, base_dir, index1+1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5. 定义一个 `is_url` 函数，用于检查给定字符串是否为有效 URL。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-10T02:59:44.864888Z",
     "iopub.status.busy": "2024-10-10T02:59:44.864530Z",
     "iopub.status.idle": "2024-10-10T02:59:44.868106Z",
     "shell.execute_reply": "2024-10-10T02:59:44.867499Z",
     "shell.execute_reply.started": "2024-10-10T02:59:44.864868Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def is_url(s):\n",
    "    url_pattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\\\(\\\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')\n",
    "    return bool(url_pattern.match(s))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "6. 定义一个 `synthesize_md_to_speech` 函数，用于将指定目录下的所有 Markdown 文件内容转换为语音并保存为 MP3 文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-10T02:59:46.002460Z",
     "iopub.status.busy": "2024-10-10T02:59:46.002020Z",
     "iopub.status.idle": "2024-10-10T02:59:46.007519Z",
     "shell.execute_reply": "2024-10-10T02:59:46.006994Z",
     "shell.execute_reply.started": "2024-10-10T02:59:46.002424Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def synthesize_md_to_speech(base_directory):\n",
    "    \"\"\"\n",
    "    识别指定目录下的所有.md文件，读取其内容并使用DashScope API将其转换为语音，\n",
    "    保存为同名.mp3文件在同一目录下。\n",
    "\n",
    "    参数:\n",
    "    base_directory (str): 包含.md文件的顶层目录路径。\n",
    "    \"\"\"\n",
    "    # 确保环境变量中存在DashScope API密钥\n",
    "    if 'DASHSCOPE_API_KEY' not in os.environ:\n",
    "        raise ValueError(\"DashScope API key must be set in the environment variables.\")\n",
    "\n",
    "    # 遍历指定目录及其子目录\n",
    "    for root, dirs, files in os.walk(base_directory):\n",
    "        for file in files:\n",
    "            if file.endswith('.md'):\n",
    "                # 构建完整文件路径\n",
    "                md_file_path = os.path.join(root, file)\n",
    "\n",
    "                # 读取.md文件内容\n",
    "                with open(md_file_path, 'r', encoding='utf-8') as f:\n",
    "                    text = f.read()\n",
    "\n",
    "                # 初始化语音合成器\n",
    "                speech_synthesizer = SpeechSynthesizer(model='cosyvoice-v1', voice='longxiaochun')\n",
    "\n",
    "\n",
    "                # 合成语音\n",
    "                audio_data = speech_synthesizer.call(text)\n",
    "\n",
    "                # 构建输出.mp3文件路径\n",
    "                mp3_file_path = os.path.splitext(md_file_path)[0] + '.mp3'\n",
    "\n",
    "                # 保存音频到文件\n",
    "                with open(mp3_file_path, 'wb') as f:\n",
    "                    f.write(audio_data)\n",
    "\n",
    "                print(f'Synthesized text from file \"{md_file_path}\" to file: {mp3_file_path}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "7. 调用 `process_json_file` 切分录音稿，然后调用 `synthesize_md_to_speech` 函数将录音稿片段转换为语音。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T03:01:51.354789Z",
     "iopub.status.busy": "2024-10-10T03:01:51.354437Z",
     "iopub.status.idle": "2024-10-10T03:02:42.889839Z",
     "shell.execute_reply": "2024-10-10T03:02:42.889277Z",
     "shell.execute_reply.started": "2024-10-10T03:01:51.354769Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 读取配置文件中的音频文件所在目录\n",
    "audio_file_folder = project_config[\"audio_file_folder\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 切分录音稿\n",
    "process_json_file(speech_script_file_path, audio_file_folder)\n",
    "\n",
    "# 将录音稿片段转换为语音\n",
    "synthesize_md_to_speech(audio_file_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5. 生成字幕"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最后，我们基于音频的时长和录音稿的文本生成音频的字幕。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. 定义一个 `format_time` 函数，用于将给定的时间（以秒为单位）格式化为“时:分:秒,毫秒”的字符串表示。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-10T02:37:38.335081Z",
     "iopub.status.busy": "2024-10-10T02:37:38.334738Z",
     "iopub.status.idle": "2024-10-10T02:37:38.338518Z",
     "shell.execute_reply": "2024-10-10T02:37:38.337898Z",
     "shell.execute_reply.started": "2024-10-10T02:37:38.335062Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def format_time(seconds):\n",
    "    hours, remainder = divmod(seconds, 3600)\n",
    "    minutes, seconds = divmod(remainder, 60)\n",
    "    milliseconds = int((seconds - int(seconds)) * 1000)\n",
    "    seconds = int(seconds)\n",
    "    return f\"{int(hours):02d}:{int(minutes):02d}:{int(seconds):02d},{milliseconds:03d}\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. 定义一个 `get_audio_duration` 函数，用于获取音频文件的时长。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-10T02:37:39.326830Z",
     "iopub.status.busy": "2024-10-10T02:37:39.326486Z",
     "iopub.status.idle": "2024-10-10T02:37:39.330068Z",
     "shell.execute_reply": "2024-10-10T02:37:39.329362Z",
     "shell.execute_reply.started": "2024-10-10T02:37:39.326811Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def get_audio_duration(file_path):\n",
    "    audio = AudioFileClip(file_path)\n",
    "    duration = audio.duration\n",
    "    audio.close()\n",
    "    return duration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3. 定义一个 `create_srt_line` 函数，用于生成SRT格式的字幕行。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-10-10T02:37:40.211867Z",
     "iopub.status.busy": "2024-10-10T02:37:40.211543Z",
     "iopub.status.idle": "2024-10-10T02:37:40.215049Z",
     "shell.execute_reply": "2024-10-10T02:37:40.214289Z",
     "shell.execute_reply.started": "2024-10-10T02:37:40.211847Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def create_srt_line(index, start_time, end_time, text):\n",
    "    return f\"{index}\\n{start_time} --> {end_time}\\n{text}\\n\\n\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4. 定义一个 `generate_srt_from_audio` 函数，用于生成字幕。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T02:51:36.553084Z",
     "iopub.status.busy": "2024-10-10T02:51:36.552747Z",
     "iopub.status.idle": "2024-10-10T02:51:36.560396Z",
     "shell.execute_reply": "2024-10-10T02:51:36.559902Z",
     "shell.execute_reply.started": "2024-10-10T02:51:36.553064Z"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def generate_srt_from_audio(base_dir: str, output_dir: str, output_srt_file: str) -> None:\n",
    "    \"\"\"\n",
    "    从指定目录下的音频文件夹生成SRT字幕文件。\n",
    "\n",
    "    :param base_dir: 包含音频文件夹的根目录。\n",
    "    :param output_dir: 输出SRT文件的目录。\n",
    "    :param output_srt_file: 输出SRT文件的完整路径。\n",
    "    \"\"\"\n",
    "\n",
    "    # 创建输出目录，如果它不存在\n",
    "    if not os.path.exists(output_dir):\n",
    "        os.makedirs(output_dir)\n",
    "\n",
    "    # 确保输出文件名有.srt后缀\n",
    "    if not output_srt_file.endswith('.srt'):\n",
    "        output_srt_file += '.srt'\n",
    "\n",
    "\n",
    "    # 初始化当前时间\n",
    "    current_time = 2.000  # 初始时间\n",
    "\n",
    "    # 打开SRT文件进行写入\n",
    "    with open(output_srt_file, 'w', encoding='utf-8') as srt_file:\n",
    "        srt_index = 1\n",
    "\n",
    "        # 获取所有符合条件的子目录，并按索引排序\n",
    "        sub_dirs = [d for d in os.listdir(base_dir) if d.startswith('audio_for_paragraph_')]\n",
    "        sub_dirs.sort(key=lambda x: int(re.search(r'\\d+', x).group()))\n",
    "\n",
    "        # 遍历所有子目录\n",
    "        for sub_dir in sub_dirs:\n",
    "            sub_dir_path = os.path.join(base_dir, sub_dir)\n",
    "\n",
    "            # 查找所有的.md和.mp3文件\n",
    "            files = [f for f in os.listdir(sub_dir_path) if f.endswith('.md') or f.endswith('.mp3')]\n",
    "            md_files = [f for f in files if f.endswith('.md')]\n",
    "\n",
    "            # 按照index1和index2排序.md文件\n",
    "            md_files.sort(key=lambda x: (int(x.split('_')[1]), int(x.split('_')[3].split('.')[0])))\n",
    "\n",
    "            # 处理每个.md文件\n",
    "            for md_file in md_files:\n",
    "                md_file_path = os.path.join(sub_dir_path, md_file)\n",
    "                mp3_file_path = os.path.splitext(md_file_path)[0] + '.mp3'\n",
    "\n",
    "                # 确保对应的.mp3文件存在\n",
    "                if os.path.exists(mp3_file_path):\n",
    "                    # 读取.md文件内容\n",
    "                    with open(md_file_path, 'r', encoding='utf-8') as f:\n",
    "                        text = f.read().strip()\n",
    "\n",
    "                    # 获取.mp3文件时长\n",
    "                    duration = get_audio_duration(mp3_file_path)\n",
    "\n",
    "                    # 生成SRT格式的字幕行\n",
    "                    start_time_str = format_time(current_time)\n",
    "                    end_time_str = format_time(current_time + duration)\n",
    "                    srt_line = create_srt_line(srt_index, start_time_str, end_time_str, text)\n",
    "\n",
    "                    # 写入SRT文件\n",
    "                    srt_file.write(srt_line)\n",
    "\n",
    "                    # 更新当前时间\n",
    "                    current_time += duration + 0.3  # 加上0.5秒以避免时间重叠\n",
    "\n",
    "                    srt_index += 1\n",
    "                else:\n",
    "                    print(f\"No corresponding MP3 file found for {md_file}\")\n",
    "\n",
    "    print(\"成功生成字幕文件：\" + output_srt_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "5. 调用 `generate_srt_from_audio` 函数生成一个字幕文件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "execution": {
     "iopub.execute_input": "2024-10-10T03:05:17.298124Z",
     "iopub.status.busy": "2024-10-10T03:05:17.297780Z",
     "iopub.status.idle": "2024-10-10T03:05:20.878520Z",
     "shell.execute_reply": "2024-10-10T03:05:20.878006Z",
     "shell.execute_reply.started": "2024-10-10T03:05:17.298103Z"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "成功生成字幕文件：./output/srt/初中电子学物理_字幕.srt\n"
     ]
    }
   ],
   "source": [
    "# 读取配置文件中的音频文件所在目录\n",
    "audio_file_folder = project_config[\"audio_file_folder\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 读取配置文件中的字幕文件所在目录\n",
    "srt_file_folder = project_config[\"srt_file_folder\"]\n",
    "\n",
    "# 读取配置文件中的字幕文件路径\n",
    "srt_file_path = project_config[\"srt_file_path\"].format(title=project_config[\"title\"])\n",
    "\n",
    "# 生成 SRT 文件\n",
    "generate_srt_from_audio(audio_file_folder, srt_file_folder, srt_file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## ✅ 本节小结\n",
    "\n",
    "- 在本次学习和实践中，我们了解了 CosyVoice 和 moviepy，并使用它们生成了音频和字幕。\n",
    "- 为了提升课程生动性，我们可以基于已有的图像、文本和音频素材生成视频。接下来，我们将学习如何剪辑这些素材以制作视频。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ✉️ 评价反馈\n",
    "感谢你学习阿里云大模型ACP认证课程，如果你觉得课程有哪里写得好、哪里写得不好，期待你[通过问卷提交评价和反馈](https://survey.aliyun.com/apps/zhiliao/Mo5O9vuie)。\n",
    "\n",
    "你的批评和鼓励都是我们前进的动力。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "learnacp",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
